Customer churn, an event in which a customer abandons an established relationship with a business, is an important problem that has been well researched in both academic and commercial settings. In this work, we propose an improved prediction model that emphasizes an effective data collection pipeline spanning varied channels and capturing both explicit and implicit customer footprints. Our goal is to demonstrate how feature selection algorithms can improve classifier efficiency. We also rank the prominent features that play a vital role in customer churn. Our contributions are threefold. First, we show how popular data mining tools in the Hadoop stack help extract several implicit customer interaction metrics, including sales and clickstream logs generated as a result of customer interaction. Second, through feature engineering techniques, we verify that some of the new features we propose have a definite impact on customer churn. Finally, we establish that Regularized Logistic Regression, SVM, and Gradient Boost Random Forests are the best-performing models for predicting customer churn, as verified through comprehensive cross-validation techniques.